ABSTRACT

This paper describes several pilot systems of data 
management using telecommunications links, which have been tested by 
the Navy during an 8-year period in which emphasis has been on the 
development of relational database management systems, exchange 
protocols, and man-machine interface. An introduction discusses the 
background of the project, which began as an attempt to computerize 
natural resource and environmental survey data for Navy-controlled
United States land. The three prototype management systems described 
were developed because of the multidisciplinary character of the data 
and the diversity of the data uses. The fundamental problems of 
taxonomy, habitual procedures, and reliability are addressed. 
Emphasis is on the natural scientist as a computerized system user,
the user interface, and data exchange applications. An expanded 
database management system currently under development is also 
briefly described. (LMM) 
Evans, abstract



As the custodian of nearly 3½ million acres of land in the United
States & as responsible tenant on k million additional acres in many
foreign countries, the U.S. Navy has an important requirement for
efficient natural resource management & for environmental quality
control. The immense geographic spread of these areas & the need for
long-term time-series comparisons in both natural resource &
environmental management dictate an efficient means of data storage,
manipulation, & exchange. In consequence, the U.S. Navy has tested
several systems of data management & data exchange using
telecommunication links. Special emphasis has been placed on the
development of relational database management systems, on exchange
protocols, & on the man/machine interface. A thorough understanding of
this interface & of the practical applications required by the user is
paramount to the success of any data-exchange network. This paper
describes several pilot systems which have been tested over the last
eight years. The fundamental problems of taxonomy, habitual procedures,
& reliability are addressed. The emphasis is on the user interface & on
the applications that efficient data-exchange makes possible. An
expanded database management system, currently under development, is
also briefly described.






Establishing Data-Exchange Networks through 
Data Management & Telecommunications 

Dr. Evan C. Evans III 
Naval Ocean Systems Center/Hawaii Lab 
Kaneohe Bay, Hawaii 96863 



Introduction 

As the operator of ships, submarines, aircraft, and land-based
facilities on a global scale, the U.S. Navy clearly has a requirement
for sophisticated, efficient data management and for data exchange
through telecommunication. The 8-year project described here began as
an attempt to computerize natural resources and environmental survey
data for the 3½ million acres of land controlled by the Navy within the
United States (Hura, 1976; Evans, 1977a). The multidisciplinary
character of these data and the extreme diversity (both in operational
requirements and in geographic location) of the data users forced the
development of a generalized, relational data management system. Since
most expertise in the natural sciences is found on university campuses,
in museums, or in organizations (both public and private) outside the
Navy, the data management system that evolved was expressly tailored to
facilitate strong interaction with these "outside" sources. An
important aspect of this project has been an overt attempt to entrain
individual users and their observations into the system through the
excellence and affordability of the data management service provided.
The generality of the relational data management systems so far
developed has permitted their effective use in many other fields, such
as meteorology, microelectronic component properties, technology
transfer, conference administration, and chemical oceanography. Three
prototype data management systems have been developed and tested, the
last of which (R*B-2.4) is currently operational. The project is
continuing with the development of R*B-3, the first full-function data
management system, expected to become operational in 1985.

The Pilot Systems 

The current project evolved out of a Navy biological survey of Pearl
Harbor, Hawaii (Evans, 1974). Computer analysis of this survey data
showed that similar "ship signatures" could be detected in Hawaiian and
a number of west coast harbors, and showed further that such analysis
applied to the observations of others could reveal biotope patterns not
recognized by the original observers (Evans, 1977b). These discoveries
led to a search for other harbor survey data that might corroborate
these findings. At that time, Navy data was archived in the University
of Hawaii's Hawaii Coastal Zone Data Bank (HCZDB), a file management
system using PANVALET. While the HCZDB was adequate for those familiar
with its contents, its lack of a data management system rendered it
quite unsuitable as a generalized database that could be shared. Since
1975 the Naval Ocean Systems Center* and the Computer Sciences
Corporation (under contract) have collaborated in developing the type
of data management system required by the Navy. This effort commenced
with an evaluation of data management systems extant in the 1975-77
time frame to find one suitable for the kinds and amounts of data being
collected.

Many data management systems (among them ENVIR, TAXIR, Bio-STORET,
System-2000, UPGRADE, DMS-1100) were examined. None could accommodate
in an adequate and affordable manner the wide range of
multidisciplinary measurements characteristic of environmental survey
data. At that time, no relational data management system existed,
although two (System-R and INGRES) were in the early stages of
development. Furthermore, most data obtained from other harbor surveys
could not be used because of inadequate supporting information. The
latter situation leads to the oft-heard statement: other people's data
are no damn good. This statement is inaccurate. Verified scientific
measurements have lasting value if they can be marshalled for the right
application with a full set of supporting information. Usually it is
the absence of necessary supporting information that disqualifies
otherwise useful data obtained from outside sources. All our findings
substantiated a definite need to develop a data management system that
could be shared with equal facility by different scientific
disciplines. Hierarchical data management systems were obviously
inadequate for such multidisciplinary application. Thus, the decision
was made in 1977 to follow the relational theory recently advanced by
E. F. Codd (Codd, 1970).

At that time, the penalties for selecting a relational approach loomed
large. Chief among these were the sequential search requirement and the
repetition of ancillary or supporting information in each tuple
(record). Proof of high search-rate capability and of effective data
compression was paramount to the success of the relational approach.
From the beginning our prime goal was the management of very large
databases at a cost that was affordable to universities and museums,
the principal sources of verifiable environmental observations. To
assure that all necessary supporting information was correctly
associated with an observation regardless of the discipline or
circumstances under which it was made, we adopted the concept of a data
template, see Figure 1. This template, developed in 1976 and still in
use, has been tested against many different types of observations
(scientific and otherwise). It has proven entirely adequate for our
data management applications. The first data management system, called
BIODAB for Biological DataBase, became operational in APR78 (Key,
1979). It was built to determine three things:

* the adequacy of the data template as a discipline-independent
  vehicle for scientific observations

* the recovery times for complex searches directly on data stored in
  relational format

* the data compression obtainable using various coding or linkage
  techniques.

The results of the BIODAB test were positive on all three scores. As
said above, the data template proved wholly adequate. Rates of 300,000
records per CPU-second* were obtained for complex searches directly on
data. BIODAB tuples could be compressed from 56 words to less than 6,
see Figure 2. The high search rates were obtained by means of the
masked search instruction available on UNIVAC machines. While
instructions emulating the UNIVAC masked search can be written for
other mainframe computers, our data management systems continue to be
specific for the UNIVAC 1100 series. The philosophy of the project is
to run a given system on the machine that is optimal for the processes
involved and to bring the user into contact with that machine through
telecommunication.
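
The compression result above depends on not repeating supporting
information in every tuple. As a rough illustration only (this is not
the BIODAB coding scheme, which is the subject of Figure 2), the
following Python sketch replaces repeated supporting fields with a
small integer link into a support table. All field names and sample
values are hypothetical.

```python
def compress(records, support_fields):
    """Replace repeated supporting fields with an integer link.

    Returns (compact_records, support_table). Each distinct combination
    of supporting values is stored only once in support_table.
    """
    support_table = []   # link -> tuple of supporting values
    index = {}           # tuple of supporting values -> link
    compact = []
    for rec in records:
        key = tuple(rec[f] for f in support_fields)
        if key not in index:
            index[key] = len(support_table)
            support_table.append(key)
        row = {k: v for k, v in rec.items() if k not in support_fields}
        row["link"] = index[key]
        compact.append(row)
    return compact, support_table


def expand(compact, support_table, support_fields):
    """Rebuild the full records from the compact form."""
    full = []
    for row in compact:
        rec = {k: v for k, v in row.items() if k != "link"}
        rec.update(zip(support_fields, support_table[row["link"]]))
        full.append(rec)
    return full


# Hypothetical sample: three observations drawn from two sources.
SUPPORT = ("observer", "method", "site")
records = [
    {"taxon": "A", "count": 12, "observer": "Smith",
     "method": "trawl", "site": "Pearl Harbor"},
    {"taxon": "B", "count": 3, "observer": "Smith",
     "method": "trawl", "site": "Pearl Harbor"},
    {"taxon": "A", "count": 7, "observer": "Jones",
     "method": "dive", "site": "Kaneohe Bay"},
]
compact, table = compress(records, SUPPORT)
```

In this sketch the supporting information for each source is held once,
and every observation carries only a one-word link to it, so the saving
grows with the number of observations that share a source.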

BIODAB was tested for two years and then retired. During that test
period, its better features were incorporated modularly into the first
of the RELATABASE or R*B systems, see Figure 3. As indicated in this
schematic, the development of all follow-on data management systems was
driven by strong interaction with the user community employing an
existing prototype in real job situations. This interactive aspect of
system development is essential to the success of any data management
system. The user community must be created simultaneously with the data
management system itself. Note also that the same data management
system was used for several quite different databases (the Oceans '79
Conference database, the Integrated Circuit DataBase, the Natural
Resources DataBase). Further discussion of meaningful involvement with
the user community, the man/machine interface, and data structure
follows in the next section.

Because the R*B systems were developed primarily for archiving,
manipulating, and sharing or exchanging numeric data, each included a
statistical processor that permitted interactive data analysis in the
sense of John Tukey (Tukey, 1977). This processor would permit any user
employing the numeric observations of another to probe or shape his
newly-acquired file through interactive analysis before applying more
sophisticated statistical treatments, like factor analysis. The
importance of such probing has been stressed by J. Stuart Hunter
(Hunter, 1980). BIODAB contained a partial implementation of Don
McNeil's interactive data analysis programs (McNeil, 1977). Such simple
displays as stemleafs, boxplots, scatter plots, and regressions to the
third power were possible. Follow-on data management systems (R*B-1 and
R*B-2) improved or enhanced these interactive capabilities.
Furthermore, BIODAB was not a strictly relational system. Its
18-character taxonomic code (see Figure 4), while fully capable of
accommodating Latin names, common names, and synonymy for all living
organisms, was hierarchical, a format not permitted in relational
systems. R*B-1 development involved two major efforts, viz:

* taking all useful features of the BIODAB design and further
  generalizing them so that they were strictly relational, and

* designing and implementing means whereby any user could create his
  own database (BIODAB did not have this capability).



* CPU-second = Central Processing Unit-second or machine-second.
RELATABASE, version 1 or R*B-1, development included the design and
implementation of processors to permit a user to:

* define his own database,

* insert records into the database so defined,

* remove selected records from that database,

* update selected records in that database, and

* unload or move part or all of that database to another file.

In addition, an editing capability was added to the search and
report-writer processors. The report-writer was also enhanced by the
addition of sorting and listing options. R*B-1 became operational in
JUN79.

Several engineering groups were attracted to the R*B-1 system with the
result that the R*B project lost its predominantly environmental cast.
The initial environmental slant had, however, served a definite
purpose. To test a generalized data management system, one must have
both complex multidisciplinary problems and a good supply of different
but fairly well organized data sets. Many scientific disciplines have
complex data management problems, but sets of organized data are often
lacking. Environmental studies and surveys offered taxonomic
complexity, convoluted and overlapping geographic and jurisdictional
boundaries, and "constants"* that change as a function of location. The
engineers soon found certain enhancements to R*B-1 to be highly
desirable. They were accommodated by a series of modifications
culminating in R*B-1.4, while a full revision, R*B-2, was being
implemented. The enhancements available in R*B-2 were:

* optimized search routines to achieve higher search speeds,

* surrogate link values making record insertion easier and cheaper,

* use monitors to collect operating data on various R*B processors
  and to provide more detailed cost breakdowns,

* a text attribute so that text or long comments could be stored,

* a list directive so that new or intermittent users could refresh
  their memories on the contents of any relation,

* a menu option to prompt new or intermittent users inputting data,

* real number representation (not implemented in R*B-1.4),

* further improvements to the stats processor, such as adding
  intrinsic functions and an equation processor.

R*B-2 became fully operational in APR81; the current modification,
R*B-2.4, was released in AUG82. Search rates in this version were
clocked at between 500,000 and 800,000 records per CPU-second. Details
of the system are described elsewhere (Key, 1979; NOSC, 1982a; NOSC,
1982b). Briefly, any R*B user can create (define), maintain (update,
insert data, remove data), and unload individual databases. R*B also
maintains individual user files which can be displayed, described,
labeled, or deleted. Any major relation in any master database can be
searched for specific values and the material so retrieved stored in a
file assigned to the individual user. A report-writer (permitting a
wide range of format specifications) and a statistical processor are
also available. R*B also has provisions for self-tutorial help and for
sending messages or bulletins. All versions of R*B currently
operational are considered prototypes. A full-function data management
system, R*B-3, discussed in the final section, is currently in the
definition phase of development.

* For example, the bald eagle is endangered or protected or both
  depending on its location. Its classification can change as it flies
  across state or county lines. Classification also depends on whether
  the bird is considered as a species or as a raptor. This curious sort
  of variation, resulting from different laws and their interpretation,
  represents a problem for the Navy as well as a real challenge to the
  designer of data management systems.

Since a data management system, of itself, contains no data, an
application of R*B-2.4 to the Natural Resources DataBase (NRDB) is
briefly described. The NRDB was established to manage the records of
the Navy's natural resource managers, who are widely distributed among
many Navy facilities throughout the continental United States and
Hawaii. Their concerns involve 3½ million acres of land (including 96
thousand acres of ponds, streams, and wetlands, and 80 thousand acres
of forest in timber production) and around 2 million civilian guests
per year, who hunt or hike or perform scientific studies on Navy land.
Their records include, but are not limited to, such disciplines as:
agriculture, archaeology, biological survey, chemistry, cultural
registration and restoration, endangered species protection, erosion
control, forestry, historic preservation, hydrology, geophysics,
grazing regulation, land use, management plan development, meteorology,
outlease inspection, pollution monitoring and prevention, recreation
control and development, resource management, soil analysis, timber
surveys, vegetation mapping, well logging, and wildlife management. The
many individuals in the work force employ different methodologies, data
formats, and filing systems. Certainly, the application is a challenge
to any data management system. The NRDB comprises four major relations
(tables), viz: OBSERVATION, CLASSIFY, USAGE, and EVENT, and five
support relations, viz: SOURCES, CONTACTS, TAXON, METHODS, and
GLOSSARY. The details of all these relations and lists of attribute
values contained in any of them are stored in the system and may be
called for at any time. An overview of data types in the NRDB is given
in Figure 5. Currently, the NRDB contains about 1 million records, each
containing many items (a mean of 41 for the major relations and of 17
for the support relations). Its size is doubling annually and is
expected to approach 6 million records by the time R*B-3 becomes
operational.
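
The growth figures above can be checked with a short calculation. The
Python sketch below assumes steady doubling every year from about 1
million records, which is a simplification, and estimates how long the
database would take to reach a given size. The function names are
illustrative.

```python
import math


def years_to_reach(target, current=1_000_000, doubling_time_years=1.0):
    """Years for a database that doubles every `doubling_time_years`
    to grow from `current` to `target` records."""
    return doubling_time_years * math.log2(target / current)


def size_after(years, current=1_000_000, doubling_time_years=1.0):
    """Projected record count after `years` of steady doubling."""
    return current * 2 ** (years / doubling_time_years)


print(round(years_to_reach(6_000_000), 2))
```

Under these assumptions, 6 million records would be reached after
roughly two and a half years of annual doubling.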

User Interaction 

As mentioned above, strong interaction with a user community is an
essential aspect of the development of effective data management
systems. User interaction has been an important part of this project
since its inception; quantitative study of user activity, however,
commenced in 1979. Many things are involved, including
telecommunications, network protocols, reliability, man/machine
interfaces, user behavior, natural language, and data structure. Only a
few of these subjects can be touched on here. Fortunately, good reports
of user interaction exist (Hiltz & Turoff, 1978; Vallee, 1978; Johansen,
1978). Hiltz and Turoff's excellent summary of what system designers
must expect should always be borne in mind. Users will:

* fail to notice even the most explicit instructions

* do the unexpected, the unanticipated, and the forbidden

* disregard or forget instructions

* often fail to ask for help when they need it

* form opinions based on inadequate knowledge

* use the system only if it benefits them.

Hiltz & Turoff (p 61) emphasize the crucial importance of a
user-oriented monitor, providing in-person or telephone training and
serving as a point of contact with system designers or operators. They
also describe (pp 46-61) the animosity of established ADP* groups to
the development of new computerized systems. This project has had
exactly these same experiences. The importance of one (or more)
full-time, user-oriented monitors cannot be overemphasized.

Hiltz and Turoff's observations are confined to computerized
conferencing systems which do not involve the sophisticated management
of scientific data. Our experience overlaps theirs in the areas of user
support and electronic mail, the latter being used in conjunction with
but not as part of the R*B development project. The R*B systems
interface with ARPANET, a packet switching network implemented by Bolt
Beranek & Newman for the Advanced Research Projects Agency in 1969, see
Figure 6. The electronic mail and file transfer protocols associated
with ARPANET were used extensively. Experience using these systems as
well as the R*B systems is here summarized. The emphasis is on the
natural scientist as a user of computerized systems. As shown in Figure
7, there is a wide spread in the amount of individual use. Of the NRDB
user community, about 80% fell into the light-to-occasional category.
These users tended to disappear unless they were expressly cultivated
by R*B monitoring personnel. The reasons for their disappearance were
various, but prominent among them were dislike or fear of computers or
failure to appreciate the utility of computerization in their work. The
remainder of the R*B community was divided into moderate users (15%)
and heavy users (5%). About half the moderate user category tended to
move upward into the heavy user category.

Often scientists tend to be curiously ambivalent with respect to their
own data in that they regard them as both worthless and highly
proprietary. This behavior is the result of fear of preemption or
misuse combined with the fact that data are regarded as the raw
material which ultimately supports publication. It is difficult
therefore to get scientists either to share their data or to store them
in a rigorously accountable manner. We estimate that about 90% to 95%
of all basic scientific observations are so poorly archived as to be
essentially lost. This follows from the fact that scientists are
trained to extract information from data, not to husband data after the
manner of accountants. The NRDB with its data template can, therefore,
be regarded as an educational tool. Monitoring observations showed that
continued use of the NRDB improved both field and laboratory procedure
in the sense of more thorough and more accountable note-taking. Since
verified basic measurements (as opposed to the reduced data published)
tend to have a high degree of commonality and to retain their value
indefinitely, any procedure that archives data in exchangeable form is
decidedly cost-effective. This is especially true in the environmental
sciences where long-term time-series analyses are required to detect
subtle changes.

* ADP = Automatic Data Processing; also EDP = Electronic DP.

While scientists' customary behavior usually results in massive data
loss, other important and unreported data sinks are to be found in the
military. The 3-year tour of duty with its associated name/code changes
for groups, commands, projects, buildings, bases, &c. adds up to a
thumping loss of corporate memory. The penchant for acronyms does not
help. Often the basic measurements are still on file but the supporting
information necessary for their use has been lost. On the basis of our
experience in sequestering data from various sources, we estimate that
the half-life of basic measurements is less than 3 years in the
military, between 7 and 10 years in the private sector, and between 20
and 30 years on university campuses. J. Stuart Hunter quotes a National
Bureau of Standards estimate that in 1977 the U.S. government spent
$690 million for data gathering (Hunter, 1980). The cash value of these
data losses can, therefore, be inferred to be significant. Verifiable
basic measurements are in themselves a valuable resource and should be
conserved. The applications supported by the R*B systems are expressly
designed for that purpose. During the life of this project,
computerized data storage has become far less expensive than any other
means. With appropriately designed data management systems, access to
data so stored becomes flexible, efficient, and affordable.
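
To make the half-life estimates above concrete, the Python sketch below
assumes simple exponential decay. The term "half-life" suggests this
model, but the text does not state it. The sketch computes the fraction
of basic measurements still usable after a given number of years. The
function name and the midpoint values chosen for the private-sector and
university ranges are illustrative assumptions.

```python
def fraction_usable(years, half_life_years):
    """Fraction of measurements still usable after `years`, assuming
    exponential decay with the given half-life."""
    return 0.5 ** (years / half_life_years)


# Half-lives in years: the upper bound quoted for the military and,
# as an assumption, the midpoints of the other two quoted ranges.
HALF_LIVES = {"military": 3.0, "private sector": 8.5, "university": 25.0}

for sector, half_life in HALF_LIVES.items():
    share = fraction_usable(10, half_life)
    print(f"{sector}: {share:.0%} usable after 10 years")
```

On this model, a 3-year half-life leaves only about a tenth of the
measurements usable after a decade.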

The man/machine interface continues to receive insufficient attention.
Obviously the person who can compose at a standard QWERTY keyboard has
a monumental advantage over those who cannot. The proliferation of
non-standard additions to that keyboard displays more of the American
penchant for packaging than of a coordinated approach to user needs.
These problems, while admittedly beyond the purview of a project to
develop generalized data management software, are nonetheless felt as
we canvass our users. Data-linking reliability is a second problem in
this beyond-our-control category, and one that has been so severe as
nearly to cause the demise of the project. As stated above, strong
interaction with the user community is paramount, not only as a
requirements source for system designers but also as a means of
developing a user community while the data management system itself is
being developed. With the widespread aversion to computers,
particularly apparent among the natural scientists, low data-linking
reliability or long down times is the primary cause of the user loss
mentioned above. The probability of a remote user being able to access
his data is the product of the reliabilities of at least three systems
which have nothing whatsoever to do with the data management system
itself. At the least, these are: the telephone link, the ARPANET link,
and the host computer. During a six-month monitoring period in 1981,
these reliabilities were estimated to be 0.79, 0.91, and 0.58
respectively, for a product of 0.42. In short, the data-linker could be
assured of reaching his/her data slightly less than once in two tries.
This is an admittedly worst-case situation in that we were obliged to
use very noisy telephone lines and our host computer (the one that
interfaces with ARPANET, not the UNIVAC where the data was housed) was
a severely overloaded machine. To put these figures in better
perspective, the probability of reaching the correct person on the
first telephone call should be considered, viz: 0.26 success, 0.10
busy, 0.28 no answer, 0.28 wrong person, 0.07 misdial or other problem
(Wedemeyer, 1980). The point here is that most users are very tolerant
of the telephone without realizing it, whereas they tend to be
extremely intolerant of computerized systems. It should be added that
many NRDB users are hardwired into the UNIVAC and therefore enjoy high
access reliability.
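
The product rule used above is the standard result for components in
series. The short Python sketch below reproduces the 1981 figure and
adds the mean number of attempts needed for one success. It assumes the
three links fail independently and that repeated attempts are
independent of one another; the function names are illustrative.

```python
from math import prod


def access_probability(reliabilities):
    """Probability that every link in a serial chain is up, assuming
    the links fail independently of one another."""
    return prod(reliabilities)


def expected_tries(p_success):
    """Mean number of attempts up to and including the first success,
    assuming independent attempts (geometric distribution)."""
    return 1.0 / p_success


# Reliabilities quoted in the text for the 1981 monitoring period:
# telephone link, ARPANET link, host computer.
links_1981 = [0.79, 0.91, 0.58]
p = access_probability(links_1981)
print(f"access probability: {p:.2f}")
print(f"expected tries: {expected_tries(p):.1f}")
```

Because the reliabilities multiply, the weakest link dominates: raising
the host computer from 0.58 to a value near the other two does more for
the user than any further improvement to the better links.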



Our data management philosophy of bypassing "portability" and bringing
the user into contact with the mainframe computer that can best do the
job desired requires that these data-linking problems be solved.
ARPANET's recent (01JAN83) switch from NCP* to TCP* and a significant
upgrading of our host computer have greatly improved matters. Current
probabilities are estimated at 0.79, 0.95, and 0.95 respectively, for a
product of 0.71. This still leaves room for improvement. The quality of
telephone lines and the manner in which the itinerant data-linker is
handled by the telephone companies also need improvement. The problems
of the itinerant data-linker, the one moving about the country carrying
a portable terminal, seem to be largely neglected by the telephone
companies. Dialing protocols change with variations from rotary to
touchtone instruments and change more confusingly as one moves from
regional exchange to regional exchange. Directions cannot be
conveniently found in the telephone directories, nor can they be
obtained from the operators, who are trained to give only a limited set
of responses. These are minor, but nonetheless real, problems which
currently cause the itinerant data-linker severe heartburn. The switch
from NCP to TCP suggests, however, that interactive data systems,
portable terminals, and the like are at last coming into their
inheritance. Thus, the difficulties enumerated here should shortly be
resolved.



* NCP = Network Control Protocol; TCP = Transmission Control Protocol.
There are, however, more severe problems deserving more attention than
currently seems to be lavished upon them. The business of manual
writing and machine-tutorial composition needs all the attention it can
get. Arthur Naiman's Introduction to WordStar™ is an example of
progress in this direction (Naiman, 1982). There seems to be a need for
3-color printing so manuals can distinguish unequivocally user-input,
machine-output, and comments concerning the first two items.

While the project gives careful attention to the preparation of manuals
and machine-tutorials, the matter of data structure is more clearly in
the province of NRDB support. This is a complex, difficult, and often
neglected field which is essential to the establishment of practical
and efficient databases. A thoughtful inspection of the U.S. Library of
Congress' call numbers for botanic monographs shows, preserved in stone
as it were, the pitfalls of insufficient consideration of data
structure. We do not claim now to have finalized data structure for the
NRDB. A few examples, however, are provided to illustrate the problem
and our approach to it. NRDB users frequently consider complexes of
actions to be separate entities. Consider the complex:

  consultation/conference/meeting/briefing/congress/seminar/workshop

or another such:

  inspection/inventory/survey/tour/observation-set/reconnaissance.

Certainly, there are differences between the elements of these
complexes, but are the differences sufficient to require separate
treatment in defining a relation? In our estimation, there is roughly
80% functional similarity between the elements within each exemplary
complex. We have attempted to use the concepts of natural language
(Sager, 1981) and of the selection properties of words (Bloomfield,
1933) to assist us with these problems. However, careful study of user
work habits and continuous dialogue between the user and the
user-oriented monitor appear to be the most efficient means of
solution. The situation is part of a larger problem which is central to
the success of any database, viz. an efficient and rigorous taxonomy.

The taxonomic codes employed by BIODAB worked beautifully, but they
were hierarchical and therefore not admissible into a generalized
relational system. All efficient formal taxonomies are hierarchical and
the problem of mapping such a system into relational format is not a
simple one. The current TAXON relation in NRDB uses the Linnaean
binomial/trinomial system since it has withstood the tests and trials
of over 200 years. The system, however, is confounded by the fact that
botanic usage (Int. Code, 1975) and zoologic usage (Int. Code, 1961)
employ the same names for different levels in the hierarchy. Worse, the
same discipline will use the same name at two different levels!
Obviously, such practice cannot be tolerated in a computerized system.
The solution currently employed in TAXON is shown in Figure 8. The use
of flags is regrettable but necessary. A better solution is still being
sought during R*B-3 development. The drawbacks of our current solution
are somewhat mitigated by the fact that R*B-3 will support customized
applications that maintain user-profiles (one user may have several
aliases). One or more of these profiles can automatically set the
taxonomic flags customarily employed by a given user when he/she logs
into the system. Other dictionary or menu solutions are also possible,
but thus far the difficulty remains. Non-biologic taxonomies are also
hierarchical, thus our TAXON solution can be applied to them as well.
An example using a formal taxonomy for man-made objects (Chenhall,
1978) is provided in Figure 9. Please note that cladistic or
evolutionary significance is emphatically not implied by these
arrangements. They are erected simply for the orderly accommodation of
a wide range of entities in a computerized database.
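
One common way to carry a hierarchy in a flat relation is to give each
tuple a link to its parent, plus a flag naming the nomenclatural code
it follows. The Python sketch below is offered purely as an
illustration of that idea and not as the TAXON layout of Figure 8; the
attribute names, flag values, and sample rows are all hypothetical.

```python
from typing import NamedTuple, Optional


class Taxon(NamedTuple):
    taxon_id: int
    name: str
    rank: str                 # e.g. "family", "genus", "species"
    parent_id: Optional[int]  # None at the top of the stored hierarchy
    code_flag: str            # hypothetical: "B" botanic, "Z" zoologic


# Hypothetical sample rows for one zoological lineage.
ROWS = [
    Taxon(1, "Accipitridae", "family", None, "Z"),
    Taxon(2, "Haliaeetus", "genus", 1, "Z"),
    Taxon(3, "Haliaeetus leucocephalus", "species", 2, "Z"),
]


def lineage(rows, taxon_id):
    """Return the names from the top of the hierarchy down to the
    requested taxon by following parent links through the flat rows."""
    by_id = {row.taxon_id: row for row in rows}
    chain = []
    current = by_id.get(taxon_id)
    while current is not None:
        chain.append(current.name)
        current = by_id.get(current.parent_id)
    return list(reversed(chain))


print(lineage(ROWS, 3))
```

Every row has the same attributes, so the table remains a plain
relation, while the hierarchy is recovered only when a query follows
the parent links.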

Costs often loom large in administrators' eyes when computerized
databases are proposed. More often than not, these administrators are
still thinking in terms of the industrial age, as opposed to the
information age (Giuliano, 1982). They view information handling as
essentially non-productive work and data husbandry as a serendipitous
pastime rather than as a logical response to a valuable resource. For
the last decade, the cost of personnel has risen at about 10% per year
while that of computers and their usage continues to fall at about 25%
per year. Already the cost of computer storage has fallen to less than
1/100th that of paper; similar savings are realized on document
reproduction. The costs of NRDB support using R*B-2.4 are, of course,
monitored. Data obtained during the fall of 1982 are as follows:

* electronic mail     $6/hour and falling

* computer usage      $20-$60/hour (depending on the complexity of the
                      task attempted) and falling

* data preparation    $0.25-$10/record (depending on the state of the
                      raw data and on the complexity of verification) -
                      this is largely a personnel cost and is
                      amortizable as more records are entered in the
                      same category

* data entry          $0.02-$0.75/record (depending on how it is done;
                      demand or batch, for instance)

* data storage        $0.10/record per year (essentially zero if data
                      is archived on tape)

These costs may seem large to some, especially that of data preparation.
The higher data preparation costs arise when particularly messy data
sets are encountered. Great cost reductions in both this and data entry
costs can confidently be expected as the user community modifies field
and laboratory behavior to become more compatible with computerization.
With routine direct-data-entry, costs of less than 2 cents/record are
certainly achievable. As said above, the project saw significant changes
in user behavior as they continued to use the NRDB and the R*B system.
Thus far, there have been only a few instances in which the NRDB was
used to prepare a special report; the database is, after all, still
new. In all those instances, the cost of NRDB preparation was estimated
to be about 1/40th that of doing the same job manually.
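
The per-record figures above can be combined into a rough budget. The
Python sketch below is only an illustration: the function names are
hypothetical, and it assumes that preparation, entry, and storage costs
simply add per record and that the quoted 10% and 25% annual rates
compound steadily.

```python
# Per-record cost ranges (US dollars) quoted above for fall 1982.
PREPARATION = (0.25, 10.00)   # data preparation, per record
ENTRY = (0.02, 0.75)          # data entry, per record
STORAGE_PER_YEAR = 0.10       # on-line storage, per record per year


def batch_cost_range(records, years_on_line=1):
    """Return (low, high) dollar estimates for loading and storing a batch.

    Assumes the three per-record costs are additive (an assumption of
    this sketch, not a statement from the cost data).
    """
    storage = STORAGE_PER_YEAR * years_on_line
    low = records * (PREPARATION[0] + ENTRY[0] + storage)
    high = records * (PREPARATION[1] + ENTRY[1] + storage)
    return round(low, 2), round(high, 2)


def relative_cost(years, personnel_growth=0.10, computer_decline=0.25):
    """Ratio of computer cost to personnel cost after `years`, starting at 1."""
    return ((1 - computer_decline) / (1 + personnel_growth)) ** years


low, high = batch_cost_range(1000)   # 1,000 records kept on line for a year
ratio_after_decade = relative_cost(10)
```

Under these assumptions a batch of 1,000 records costs between $370 and
$10,850 in its first year, and after ten years of the quoted trends the
cost of computing relative to personnel falls to roughly 2% of its
starting value.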
The Future

The NRDB supported by R*B-2.4 will continue to be maintained until
the full-function R*B-3 system becomes operational around JUN85. At
that time all records (an estimated 6 million) will be transferred into
the new data management system. The cost of this transfer is expected
to be minimal since a continuous dialogue is maintained between the NRDB
database administrators and the R*B-3 system designers.

As stated above, R*B-3 currently is in the Definition phase.
Development will continue in three more phases: Design commencing the
summer of 1983, Implementation commencing the spring of 1984, and
Demonstration commencing the spring of 1985. In order to obtain Navy
approval, the project was required to show superior performance and cost
effectiveness for the proposed R*B-3 system. This was done by comparing
the existing R*B-2.4 prototype against commercial relational database
management systems and database machines available in early 1982 (NOSC,
1982c). R*B-3 will be a generalized, full-function relational system
compatible with the management of multidisciplinary scientific
measurements. Search rates of at least 2 million records per CPU-second
are confidently expected. More explicitly, R*B-3 will have:

* a common query language syntax and grammar for all user functions

* the ability to merge data from different relations into new
  combinations

* the ability to support customized interfaces to the database,
  including specialized menu formats, application packages such as
  statistical and graphical analysis, word processing and document
  production

* multi-level security to control access on all levels from a relation
  (primary table) to a single data item (column-row intersection in a
  table)

* audit trails to monitor access to data and to provide for automatic
  recovery in the event data are lost or corrupted

* greatly improved efficiency through use of attribute-packing,
  trigger, and assertion routines currently being defined.
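
The R*B-3 query language is not shown in this paper, so the following
Python sketch illustrates only the general relational idea behind
"merging data from different relations": a natural join on shared
attributes. The function natural_join, the relation names, and the
sample rows (apart from the taxon and family taken from Figure 4) are
hypothetical.

```python
def natural_join(left, right):
    """Join two relations (lists of dicts) on the attributes they share."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    # Index the right-hand relation by its values for the shared attributes.
    index = {}
    for row in right:
        index.setdefault(tuple(row[a] for a in shared), []).append(row)
    joined = []
    for row in left:
        for match in index.get(tuple(row[a] for a in shared), []):
            joined.append({**row, **match})
    return joined


# Hypothetical observation relation (sites and counts are made up).
observations = [
    {"taxon": "Acanthurus triostegus", "region": "Site A", "quantity": 12.0},
    {"taxon": "Acanthurus triostegus", "region": "Site B", "quantity": 3.0},
    {"taxon": "unidentified", "region": "Site B", "quantity": 1.0},
]
# Hypothetical taxonomic relation, using the family shown in Figure 4.
taxa = [{"taxon": "Acanthurus triostegus", "family": "Acanthuridae"}]

merged = natural_join(observations, taxa)
```

Rows without a partner in the other relation (the "unidentified"
observation here) drop out of the result, which is the usual behavior
of a natural join.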

Finally, it should be emphasized that at least the NRDB application
of R*B-3 will continue to operate in the public domain as it does now.
Participation by non-Navy organizations, particularly universities and
museums, is expressly invited on a pay-as-you-go basis. Private as well
as other government agencies working in environmental fields are invited
as well. The overall intent of this project is to capture valuable
environmental data and preserve them for public use.

ATTRIBUTES OF A
FULLY DOCUMENTED MEASUREMENT

MEASUREMENT

VARIABLE - entity on which measurement was made, including name &
           classification, state, & components if reported

QUANTITY - amount of variable measured; any real number, dimensionless

UNITS - specific magnitude of quantity adopted as a standard for
        comparison with other quantities of the same kind

DESCRIPTION

METHOD - technique by which measurement was made

LATITUDE - first horizontal spatial coordinate of measurement

LONGITUDE - second horizontal spatial coordinate of measurement

ELEVATION/DEPTH - vertical spatial coordinate of measurement

DAY/TIME - temporal coordinate of measurement

REGION - regional geographic name where measurement was made

INVESTIGATOR - person making measurement

AFFILIATION - organization with which investigator was associated at
              time of measurement

SPONSOR - organization which underwrote measurement

HABITAT - environment or surroundings in which measurement was made

Figure 1. BIODAB Data Template
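
As an illustration only, the thirteen attributes of the template can be
written as a record type. The Python sketch below is not BIODAB code:
the class name, field types, and the sample values are assumptions made
for the example.

```python
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass(frozen=True)
class Measurement:
    """One fully documented measurement, following the Figure 1 template."""
    # The measurement proper
    variable: str            # entity measured (name, classification, state)
    quantity: float          # any real number, dimensionless
    units: str               # standard of comparison for the quantity
    # Description of the measurement
    method: str
    latitude: float          # decimal degrees (assumed representation)
    longitude: float         # decimal degrees (assumed representation)
    elevation_depth: float   # vertical coordinate
    day_time: datetime
    region: str
    investigator: str
    affiliation: str
    sponsor: str
    habitat: str

    def is_located(self):
        """True when the horizontal coordinates are within valid ranges."""
        return -90 <= self.latitude <= 90 and -180 <= self.longitude <= 180


# Hypothetical record; every value except the taxon name is invented.
sample = Measurement(
    variable="Acanthurus triostegus", quantity=12.0, units="individuals",
    method="visual transect", latitude=21.4, longitude=-157.8,
    elevation_depth=-5.0, day_time=datetime(1982, 10, 1, 9, 30),
    region="hypothetical reef site", investigator="J. Doe",
    affiliation="hypothetical laboratory", sponsor="hypothetical sponsor",
    habitat="reef flat",
)
attribute_count = len(fields(Measurement))
```

Making every attribute mandatory mirrors the template's intent that a
measurement is not "fully documented" until all thirteen are present.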
Figure 2. Schematic of Compression Used in BIODAB Tuple
[Schematic. Labels legible in the original figure: FIELD DATA & OBS.;
Fld. Obs.; NOSC Users; NOAA Data; DATA BASE(s) - computer systems
producing data on demand; Hawaiian Coastal Zone Data Base (DB);
management system ONLY; DATA MANAGEMENT SOFTWARE; Development; Analysis
of HCZDB content; 1st relational DB prototype; Implementation]

Figure 3. Schematic History of NOSC's Data Management System Development



[Table. Column headings in the original figure: Taxonomic Name; Taxon;
Fieldata Character; Binary Representation (in machine code)]

Taxonomic names legible in the original, in order:
  Subclass, Order, Suborder, Infraorder, Section, Subsection,
  Superfamily, Subfamily, Tribe, Subtribe, Genus, Species, Subspecies,
  Taxon Level*

Taxon entries, in order:
  Chordata, Vertebrata, Pisces, Osteichthyes, n.a., n.a., n.a.,
  Acanthopterygii, Perciformes, Acanthuroidei, n.a., n.a., n.a., n.a.,
  n.a., Acanthuridae, n.a., n.a., n.a., Acanthurus, triostegus, n.a.,
  Species

Acanthurus triostegus = 2DS@BDG@@@@C@@C C@P

Binary representation (in machine code):
  110010 001001 011000 000000 000111 001001
  001100 000000 000000 000000 000000 001000
  000000 000000 001000 000001 000000 010101

* Taxonomic level indicator; 1 = Phylum, 22 = Subspecies; Level
  Indicator + 30 = Common Name, + 60 = Synonym
† TBN = Three-Bit-Nibble, or three bits read as a byte; half a UNIVAC
  byte

Figure 4. Taxonomic Code used in BIODAB - How Compression is Accomplished
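
The arithmetic behind the figure can be checked with a short Python
sketch. It is not BIODAB code. The six-bit groups are transcribed from
the binary column of Figure 4; the function names are hypothetical, and
packing six groups per 36-bit word is an assumption about the UNIVAC
host rather than something stated in the figure.

```python
# Six-bit groups transcribed from the binary column of Figure 4.
GROUPS = ("110010 001001 011000 000000 000111 001001 "
          "001100 000000 000000 000000 000000 001000 "
          "000000 000000 001000 000001 000000 010101").split()


def to_nibbles(groups):
    """Split each six-bit group into two three-bit values (0-7)."""
    nibbles = []
    for group in groups:
        nibbles.append(int(group[:3], 2))
        nibbles.append(int(group[3:], 2))
    return nibbles


def pack_words(groups, per_word=6):
    """Pack six-bit groups into integers, `per_word` groups to a word."""
    words = []
    for start in range(0, len(groups), per_word):
        words.append(int("".join(groups[start:start + per_word]), 2))
    return words


nibbles = to_nibbles(GROUPS)      # 36 three-bit nibbles (TBNs)
words = pack_words(GROUPS)        # 3 words under the 36-bit assumption
level_indicator = int(GROUPS[-1], 2)
```

The eighteen groups yield 36 TBNs and, under the stated assumption, fit
in three machine words. The final group has the value 21, which would
correspond to Species if the levels are numbered consecutively from
Phylum = 1 to Subspecies = 22; that reading is an inference from the
footnote, not a statement in the figure.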
NRDB Stored Information

[Diagram. Labels legible in the original figure, in order: Lacustrine;
Open Water; MAINTENANCE; Outlease; Water Wells; Fish & Wildlife;
Forestry; Land; Landscape; Special; Wildland; PEDOLOGY; Flood Deposit
Soils; OBSERVATIONS & SURVEYS; Archaeology; Birds; Coastal Marine;
Forest Inventory; Water Table; Weather; Hunting; REGULATED; RESOURCE
MANIPULATION; Military]

Figure 5. Types of Data Stored in NRDB, OCT82
Figure 6. ARPANET Geographic Map, FEB82

[Plot. Axis label legible in the original figure: Mean Separation from
Two Closest Neighbors (hours)]

Figure 7. NRDB User Activity, 1981-1982
 *1.  KGD  Kingdom
 '2.  sKG  Subkingdom/Category
 *3.  PHY  Phylum/Division
 '4.  sPH  Subphylum/Subdivision
  5.  pCL  Superclass
 *6.  CLS  Class
 '7.  sCL  Subclass
  8.  iCL  Infraclass/DivisionF/DivisionI/SeriesC
  9.  pOR  Superorder/Cohort/SubdivisionI/SectionI
*10.  ORD  Order
'11.  sOR  Suborder
 12.  iOR  Infraorder/DivisionC/SectionC/TribeC/TribeI
 13.  pFM  Superfamily/SubdivisionC/SubsectionC/SubtribeC/SubtribeI
*14.  FML  Family
'15.  sFM  Subfamily
 16.  iFM  (Infrafamily)/Contribe/DivisionO
 17.  TRB  Tribe
 18.  sTR  Subtribe/SectionG/SeriesO
 19.  pGE  (Supergenus)/SubseriesO
*20.  GEN  Genus
'21.  sGE  Subgenus
 22.  STN  Section
 23.  SER  Subsection/Series
 24.  pSP  (Superspecies)/Subseries
*25.  SPC  Species
'26.  sSP  Subspecies/Variety/Breed
*27.  TAX  Binomial (GEN + SPC) or Trinomial (GEN + SPC + sSP) or
           Variety/Breed/Form/Race/Cultivar/Cross & their subs,
           in short the specific entity
*28.  LVL  Taxon Level
*29.  AUT  Authority
*30.  DAT  Date
*31.  STS  Status
*32.  VID  Vide
*33.  CMT  Comments
*34.  UPD  Update

Notes:

attribute types -
   *          = primary taxonomic level or master attribute status
   '          = secondary taxonomic level
   (unmarked) = sliding taxonomic level or little used taxonomic level

attribute flags -
   C = crabs, F = fish, G = grasses, I = insects, O = orchids

Figure 8. Schema for a Generalized Taxonomic Hierarchy (RELATABASE-3)
Figure 9. An Application of the Generalized Taxonomic Schema
(RELATABASE-3)